-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement block prefiltering for Join
s of a lazy result + IndexScan
#1647
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1647 +/- ##
==========================================
+ Coverage 89.53% 89.57% +0.04%
==========================================
Files 381 381
Lines 36649 36792 +143
Branches 4146 4170 +24
==========================================
+ Hits 32813 32957 +144
- Misses 2520 2522 +2
+ Partials 1316 1313 -3 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A first somewhat thorough pass on the code (not yet the tests).
I think I still found a bug, if you agree, fix it, and add a test that proves who of us was right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some additional comments.
rti.numRows_ = metadata.numElementsYielded_; | ||
rti.totalTime_ = metadata.blockingTime_; | ||
rti.addDetail("num-blocks-read", metadata.numBlocksRead_); | ||
rti.addDetail("num-blocks-all", metadata.numBlocksAll_); | ||
rti.addDetail("num-elements-read", metadata.numElementsRead_); | ||
|
||
// Add more details, but only if the respective value is non-zero. | ||
auto updateIfPositive = [&rti](const auto& value, const std::string& key) { | ||
if (value > 0) { | ||
rti.addDetail(key, value); | ||
} | ||
}; | ||
updateIfPositive(metadata.numBlocksSkippedBecauseOfGraph_, | ||
"num-blocks-skipped-graph"); | ||
updateIfPositive(metadata.numBlocksPostprocessed_, | ||
"num-blocks-postprocessed"); | ||
updateIfPositive(metadata.numBlocksWithUpdate_, "num-blocks-with-update"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this part also a member function of the LazyScanMetadata
?
This is then better if we add additional members to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an abstraction that can be moved into LazyScanMetadata
, but the part that calls updateRuntimeInformationWhenOptimizedOut
has to be part of a class related to Operation
and the compressed relation reader class currently does not include RuntimeInformation
, which means a new header would need to be added that isn't really related to this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't feel that you have the time right now, limit yourself to the comments that are important for the main functionality of this PR.
The real solution here is that neither the LazyScanMetadata
nor the CompressedBlockMetadata
should be part of the large CompressedRelation.h
file.
src/engine/IndexScan.cpp
Outdated
Result::Generator generator_; | ||
// The column index of the join column in the tables yielded by the generator. | ||
ColumnIndex joinColumn_; | ||
// Metadata and blocks of this index scan. | ||
Permutation::MetadataAndBlocks metaBlocks_; | ||
// The iterator of the generator that is currently being consumed. | ||
std::optional<Result::Generator::iterator> iterator_ = std::nullopt; | ||
// The index of the last matching block that was found using the join column. | ||
std::optional<size_t> lastBlockIndex_ = std::nullopt; | ||
// Values returned by the generator that have not been re-yielded yet. | ||
std::deque<Result::IdTableVocabPair> prefetchedValues_{}; | ||
// Metadata of blocks that still need to be read. | ||
std::vector<CompressedBlockMetadata> pendingBlocks_{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please reorder them, s.t. the :
- Generator
- iterator
- joinColumn
- prefetchedValues_
- pendingBlocks
- lastBlockIndex
- the rest.
Then they are grouped correctly and easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order is currently mainly a result of moving all non default initialized values to the top
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, in that case, for such an complex struct, write an explicit constructor which takes the arguments that you need and then you can order them semantically logic.
@@ -29,24 +29,14 @@ class Join : public Operation { | |||
vector<float> _multiplicities; | |||
|
|||
public: | |||
// `allowSwappingChildrenOnlyForTesting` should only ever be changed by tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How hard is it to get rid of the tests that require this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very hard. The other constructor with the "only for testing" tag was previously used, but I had to change it to make it work with the new function signatures to the point where it pretty much ended up like a second constructor that does almost the same.
if (allowSwappingChildrenOnlyForTesting) { | ||
std::swap(t1, t2); | ||
std::swap(t1JoinCol, t2JoinCol); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which tests does this break, and can they be fixed as part of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a super strong opinion here, you can leave it for now, s.t. this
becomes mergeable in a finite amount of time:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All that use the "join lambda", see test/util/JoinHelpers.h
Conformance check passed ✅No test result changes. |
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much!
IndexScan
Join
s of a lazy result + IndexScan
This fixes a bug introduced in #1647, which led to a failure of ` AD_CORRECTNESS_CHECK(!prefetchedValues.empty() || innerState->doneFetching_);` in `IndexScan::createPrefilteredJoinSide`.
With this PR, a join between a lazy intermediate result and an
IndexScan
prefilters the blocks of theIndexScan
before reading them. Before this PR, this prefilter was only applied to joins between twoIndexScan
s and between anIndexScan
and a fully materialized intermediate result.